Submission to the metric track

Introduction

Choosing between man- and zone coverage is one of the most important strategic decisions a defensive coordinator has to make before each offensive play in American football. While experienced coaches and quarterbacks can often identify these defensive strategies visually, the growing availability of tracking data offers another way to infer the defense's tactics. This project leverages hidden Markov models (HMMs) to detect the defensive strategy (man or zone coverage) from pre-snap player movement data. By modeling hidden states that represent the offensive player being guarded, this approach builds on previous attempts to predict zone or man coverage (see screenshot below from the Pittsburgh Steelers vs. Cleveland Browns match), which focused primarily on plays without pre-snap motion. In contrast, we are able to include player movement and exploit the additional information it provides. In this way, we provide a data-driven framework for unravelling the complexity of defensive patterns, enabling real-time tactical insights for coaching and analysis.

Coverage Prediction

Data

Analyzing tracking data from nine weeks of the 2022 NFL season, we aim to forecast the defensive scheme (man- or zone defense). For this, we use the corresponding PFF data, in which every play was analyzed and assigned one of the categories , and representing the different schemes. As it is not clearly documented what means, we omit every play associated with this value. This leaves XY plays in total, of which the defense played Y in zone and X in man coverage.

Within these plays, we restrict the tracking data to the frames after the line has been set (we are not interested in how players come out of the huddle) and before the center snaps the ball. For the HMM analysis, we further restrict the sample to plays with pre-snap motion (ZZ plays).
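The frame window described above can be sketched as follows. This is a minimal illustration, assuming each play's frames are available as dicts with a `frame_id` and an `event` field; the event strings `"line_set"`, `"ball_snap"` and `"man_in_motion"` are assumptions about how the tracking data labels these moments, not confirmed column values.

```python
from typing import Dict, List, Optional

# Assumed event labels; the exact strings in the tracking data may differ.
LINE_SET = "line_set"
SNAP = "ball_snap"
MOTION = "man_in_motion"


def presnap_window(frames: List[Dict]) -> List[Dict]:
    """Keep the frames from the line-set event up to (but excluding) the snap."""
    ordered = sorted(frames, key=lambda f: f["frame_id"])
    start: Optional[int] = None
    for f in ordered:
        if start is None and f.get("event") == LINE_SET:
            start = f["frame_id"]
        elif start is not None and f.get("event") == SNAP:
            return [g for g in ordered if start <= g["frame_id"] < f["frame_id"]]
    return []  # line set or snap missing: drop the play


def has_motion(window: List[Dict]) -> bool:
    """Flag plays with an (assumed) motion event between line set and snap."""
    return any(f.get("event") == MOTION for f in window)
```

Plays for which `presnap_window` returns an empty list would simply be excluded, and `has_motion` would select the subset used for the HMM analysis.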

Feature engineering

To accurately forecast the defensive scheme (man- or zone defense) for every play, we need to create various features derived from the tracking data. In particular, we conducted the following feature engineering steps:

Analysis

Our analysis comprises four steps:

1. Pre-motion analysis

We train a model to predict whether the defense plays a man- or zone coverage scheme. In particular, …..

The model uses the previously described features, …
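The model type is not specified above, so the following is only a stand-in: a plain logistic regression fitted by gradient descent, assuming a numeric feature matrix with one row per play and the coding 1 = man, 0 = zone. It illustrates the shape of the pre-motion classification step, not the model actually used.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit a logistic regression by batch gradient descent (stand-in model).

    X: (n_plays, n_features) feature matrix; y: 1 = man, 0 = zone (assumed coding).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        # mean log-loss gradient plus a small ridge penalty (intercept not penalized)
        grad = Xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]
        w -= lr * grad
    return w


def predict_proba(w, X):
    """Predicted probability of man coverage for each play."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))
```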

2. HMM analysis

We model the movements of defensive players during pre-snap motion within a hidden Markov framework, in which the underlying states represent the offensive players to be guarded (see Franks et al. 2015 for a similar approach in basketball). In contrast to Groom et al. (2024), who include a dedicated state as a proxy for zonal marking during corner kicks in soccer, we cannot follow this approach, because in American football the defenders only move into their coverage zones after the snap.

The following video shows a touchdown by the Kansas City Chiefs against the Arizona Cardinals in Week 1 of the 2022 NFL season. Pre-snap, Mecole Hardman (KC #17) is in motion and is immediately followed by the defender Marco Wilson (AZ #20), which is a clear indication of man coverage.

A hidden Markov model consists of an observed time series \(\{\boldsymbol{y}_t\}_{t=1}^T\) (here, the y-coordinates of the defensive players) and an unobserved first-order Markov chain \(\{g_t\}_{t=1}^T\), with \(g_t \in \{1,\ldots,N\}\), which proxies the offensive player to be guarded at each time point \(t\). The Markov chain is fully described by an initial distribution \(\boldsymbol{\delta}=\bigl(\Pr(g_1=1), \ldots, \Pr(g_1=N)\bigr)\) and a transition probability matrix (t.p.m.) \(\boldsymbol{\Gamma} = (\gamma_{ij})\), with \(\gamma_{ij} = \Pr(g_t = j \mid g_{t-1} = i)\), \(i,j = 1, \ldots, N\). The two stochastic processes are connected by the assumption that the distribution of each observation \(\boldsymbol{y}_t\) is fully determined by the currently active state, i.e. \[ f(\boldsymbol{y}_t \mid g_1, \ldots, g_T, \boldsymbol{y}_1, \ldots, \boldsymbol{y}_{t-1}, \boldsymbol{y}_{t+1}, \ldots, \boldsymbol{y}_T) = f(\boldsymbol{y}_t \mid g_t). \] In general, \(f\) can be any density or probability mass function, depending on the type of data. Following the approaches of Franks et al. (2015) and Groom et al. (2024), we opt for a Gaussian distribution.
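To make the likelihood concrete, the sketch below evaluates it with the forward algorithm (as described in Zucchini et al. 2016), working on the log scale to avoid numerical underflow. It assumes one HMM per defender with a scalar observation (that defender's y-coordinate) and a Gaussian state-dependent distribution whose mean in state \(j\) is the current y-coordinate of offensive player \(j\) and whose standard deviation `sd` is shared across states; both the mean specification and the shared `sd` are assumptions for illustration.

```python
import numpy as np


def gaussian_logpdf(y, mean, sd):
    """Log-density of N(mean, sd^2) evaluated at y (elementwise)."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def hmm_loglik(y, off_y, delta, Gamma, sd):
    """Log-likelihood of one defender's y-coordinates under the HMM.

    y:     (T,)   y-coordinate of the defender at each frame
    off_y: (T, N) y-coordinates of the N offensive players; state j is assumed
                  to have mean off_y[t, j] (the player being guarded)
    delta: (N,)   initial distribution; Gamma: (N, N) t.p.m.; sd: emission sd
    """
    log_p = gaussian_logpdf(y[:, None], off_y, sd)  # log f(y_t | g_t = j), shape (T, N)
    log_alpha = np.log(delta) + log_p[0]
    for t in range(1, len(y)):
        m = log_alpha.max()  # shift before exponentiating to avoid underflow
        log_alpha = np.log(np.exp(log_alpha - m) @ Gamma) + m + log_p[t]
    m = log_alpha.max()
    return float(m + np.log(np.exp(log_alpha - m).sum()))
```

Maximizing this function over \(\boldsymbol{\delta}\), \(\boldsymbol{\Gamma}\) and `sd` (e.g. with a numerical optimizer) would give the maximum likelihood estimates.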

Since the hidden states represent the offensive players being guarded rather than the coverage scheme itself, we derive the decision between man and zone coverage from the number of state switches of individual defenders. In particular, a low number of switches while offensive players are in motion indicates man coverage, whereas a higher number indicates zone coverage.
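The switch-based rule can be sketched as follows. The decoding method is not specified above; this sketch assumes Viterbi decoding of the most likely state sequence, with `log_p[t, j]` holding the log state-dependent densities (for example, computed from the Gaussian model). The aggregation over defenders (mean number of switches) and the cut-off `threshold` are hypothetical choices for illustration.

```python
import numpy as np


def viterbi(log_p, delta, Gamma):
    """Most likely state sequence; log_p[t, j] = log f(y_t | g_t = j)."""
    T = log_p.shape[0]
    log_G = np.log(Gamma)
    score = np.log(delta) + log_p[0]          # best log-prob of paths ending in each state
    back = np.zeros(log_p.shape, dtype=int)   # back-pointers
    for t in range(1, T):
        cand = score[:, None] + log_G         # cand[i, j]: best path in i at t-1, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_p[t]
    states = np.empty(T, dtype=int)
    states[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):             # trace the back-pointers
        states[t - 1] = back[t, states[t]]
    return states


def n_switches(states):
    """Number of times the decoded guarded player changes."""
    return int(np.count_nonzero(np.diff(states)))


def classify_play(switch_counts, threshold=0.5):
    """Hypothetical rule: man coverage if defenders switch at most `threshold` times on average."""
    return "man" if np.mean(switch_counts) <= threshold else "zone"
```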

3. Post-motion analysis

We re-train the pre-motion model to predict whether the defense plays a man- or zone coverage scheme; in this step, however, we incorporate results from the HMM analysis as further covariates.
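Which HMM results enter as covariates is not specified above. As one possible sketch, the decoded state sequences of all defenders in a play could be summarized into a few per-play numbers and appended to that play's pre-motion features. The summary names (`mean_switches`, `max_switches`, `share_no_switch`) and the dict-based feature layout are hypothetical.

```python
import numpy as np


def hmm_covariates(decoded):
    """Summarize the decoded state sequences of all defenders in one play.

    decoded: list of 1-D integer arrays, one per defender, giving the
             offensive player guarded at each frame.
    """
    switches = np.array([np.count_nonzero(np.diff(s)) for s in decoded])
    return {
        "mean_switches": float(switches.mean()),
        "max_switches": int(switches.max()),
        "share_no_switch": float(np.mean(switches == 0)),
    }


def add_hmm_features(pre_motion_features, decoded):
    """Append the HMM covariates to a play's pre-motion feature dict (hypothetical layout)."""
    return {**pre_motion_features, **hmm_covariates(decoded)}
```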

4. Motion evaluation

By comparing the predictive performance of our pre-motion and post-motion models, we can quantify how much pre-snap motion helps to detect the correct defensive scheme. Moreover, we can assess which teams predominantly use pre-snap motion to increase the likelihood of correctly identifying the applied defensive strategy.
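A minimal sketch of this comparison, assuming both models output a probability of man coverage per play (1 = man, 0 = zone) and that each play is tagged with its offensive team. Log loss and accuracy are used here as example metrics; the metrics actually chosen are not stated above. A positive per-team value means the post-motion model predicts that team's opponents' coverage better.

```python
from collections import defaultdict

import numpy as np


def log_loss(y, p, eps=1e-12):
    """Mean binary log loss; probabilities are clipped away from 0 and 1."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def accuracy(y, p):
    """Share of plays classified correctly at a 0.5 cut-off."""
    return float(np.mean((np.asarray(p) > 0.5) == np.asarray(y).astype(bool)))


def motion_gain_by_team(teams, y, p_pre, p_post):
    """Per offensive team: reduction in log loss from pre- to post-motion model."""
    idx = defaultdict(list)
    for i, team in enumerate(teams):
        idx[team].append(i)
    y, p_pre, p_post = map(np.asarray, (y, p_pre, p_post))
    return {team: log_loss(y[ix], p_pre[ix]) - log_loss(y[ix], p_post[ix])
            for team, ix in idx.items()}
```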

Results

Insert the animation here, showing the connections given by the decoded states.

Discussion

Code

All code for data pre-processing, model training, prediction and player evaluation can be found here.

References

*Franks A, Miller A, Bornn L, Goldsberry K (2015). Characterizing the Spatial Structure of Defensive Skill in Professional Basketball. The Annals of Applied Statistics, 9(1). DOI: 10.1214/14-AOAS799.

*Groom S, Morris D, Anderson L, Wang S (2024). Modeling Defensive Dynamics in Football: A Hidden Markov Model-Based Approach for Man-Marking and Zonal Defending Corner Analysis. The 2nd International Workshop on Intelligent Technologies for Precision Sports Science.

*Zucchini W, MacDonald I, Langrock R (2016). Hidden Markov Models for Time Series: An Introduction Using R. CRC Press.

Appendix